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BAYESIAN  COMPUTATIONS  IN  SURVIVAL  MODELS  VIA  THE  GIBBS  SAMPLER 


LYNN  KUO,  ADRIAN  F.    M.    SMITH, 

Department  of  Statistics,  Department  of  Mathematics, 

University  of  Connecticut,  Imperial  College  London, 

U.    S.    A.  United  Kingdom. 


ABSTRACT.  Survival  models  used  in  biomedical  and  reliability  contexts 
typically  involve  data  censoring,  and  may  also  involve  constraints  in 
the  form  of   ordered      parameters.  In      addition,       inferential      interest 

often  focuses  on  non-linear  functions  of  natural  model  parameters. 
From  a  Bayesian  statistical  analysis  perspective,  these  features 
combine  to  create  difficult  computational  problems  by  seeming  to 
require  (multi-dimensional)  numerical  integrals  over  awkwardly  defined 
regions.  This  paper  illustrates  how  these  apparent  difficulties  can  be 
overcome,  in  both  parametric'  and  non-parametric  settings,  by  the  Gibbs 
sampler  approach  to  Bayesian  computation. 


1 .  Introduction 

For     the     Bayesian    statistical    analysis     of     other     than    simple     stylized 

models,     the    key    tool  for    calculation     is     (multi-dimensional)     numerical 

integration;     see,     for  example,     Smith    et     al     (1987)     for     a    review    of 

available      techniques.  However,        it       is       widely       recognized       that 

considerable    numerical  sophistication    is    typically    required    in    applying 

these    techniques,     and  that    this    has    thus    far    hampered    the    development 

of  routinely  available,  user-friendly,    Bayesian  computational  methods. 

This  is  particularly  true  in  the  case  of  survival  models  used  in 
biomedical  and  reliability  contexts.  Here,  features  such  as  data 
censoring,  ordered  parameters,  assumed  convexity  or  concavity  of  dis- 
tributions, all  conspire  to  produce  complicatedly  constrained  regions 
over  which  numerical  integrations  are  required.  Not  surprisingly,  the 
literature  therefore  contains  very  few  instances  of  fully  Bayesian 
analyses  in  survival  contexts  (i.e.,  presenting  full  and  accurate 
posterior  summaries,  rather  than,  say  modal  point  estimates  or  second 
derivative  uncertainty  measures). 

Recently,  however,  Gelfand  et  al  (1991)  have  shown  that  the  Gibbs 
sampler  approach  to  Bayesian  computation  (see,  for  example,  Gelfand  and 
Smith,      1990,      and     Gelfand     et     al,      1990)      effectively     side-steps     the 


seeming  problems  of  awkwardly  defined  integration  regions  in  truncated 
data  and  constrained  parameter  problems,  and  provides  an  easily 
implemented  computational  procedure. 

Our  purpose  in  this  paper  is  to  illustrate  the  simplicity  and  scope  of 
the  Gibbs  sampler  for  the  routine  Bayesian  analysis  of  survival  data, 
in  both  parametric  and  non-parametric  settings.  In  Section  2,  we 
briefly  review  the  Gibbs  sampler  and  its  general  structure  for 
constrained  parameter  and  censored  data  problems.  In  Section  3,  we 
provide  a  range  of  illustrations  of  how  the  methodology  proceeds  for  a 
variety  of  parametric  models  used  in  various  survival  modelling 
contexts.  In  Section  4,  we  give  a  non-parametric  illustration  of  the 
methodology. 


2.  The  Gibbs  Sampler,    Constraints  and  Censoring 

2.1  THE  GIBBS  SAMPLER 

In     what     follows,      densities     will     be     denoted     generically     by     square 
brackets,        so       that       Joint,        conditional       and       marginal       forms       for 
random    variables    U,    V   appear,    respectively,    as    [U,    V],     [U\V]    and    [V]. 
Marginalization   by    integration    is   denoted   by    [U]    =   SiU\V]»[V].       Given  a 
collection  of  random  variables  with  Joint  density  [U  ,    (/_,    ...,U.],      we 

shall  refer  to   [U   \U  ,    r*s],    s  =  1,    2 k,   as     the     full     conditional 

densities. 

The  Gibbs  sampler  is  a  simply  described  iterative  stochastic  simulation 
scheme,  whereby  samples  drawn  from  the  full  conditional  densities  are 
used    to    provide    summaries    of    aspects    of    the    Joint    density.        Given    an 

arbitrary  set  of  starting  values,    1/7 IT,    a     random     variate     U        is 

drawn  from   f^J^ t/7 1 .    then  a  variate  t/.      is     drawn     from      I^l^i    » 

l/^,...,u£],    and  so  on  until  I/*   is  drawn     from      [ C/   1 1/^ [/*     ].        This 

completes  one   iteration  of   the      sampler     and      results      in      a      generated 

vector   {U.,  .  . .  ,U.) .      Repeating  this      process,      after      i      iterations      we 

11 
arrive  at  a  generated  vector   (U., .  . .  ,U,).       It  can  be  shown  that,      under 

mild  regularity  conditions  (see,  for  example,  Geman  and  Geman,  1984), 
as   i-»»  this  random  vector  tends    in     distribution      to      a     random      vector 

having  the  Joint  distribution  [U U  ]. 

One  possible  procedure  for  obtaining  summaries  of  aspects  of 
[Uy...,U.]  of  interest  is  therefore  the  following.  Run  M  independent 
parallel    replications    of    the    above    sampling    procedure,     so    that,     for    I 


Judged     to     be    sufficiently         large,         the    resulting     generated     vectors 

(1/     U.  .)    ,    j  =    1,2,  ...,M   ,    can   be   regarded   as      an      lid   sample   of 

size     M     from      [U ., . . .  ,U .).  Standard     moment,      quantile     or     density 

estimation  techniques  can  then  be  employed  to  estimate  summary  features 
of  interest. 

In  the  Bayesian  inference  context,  [U  , .  .  .U.]  is  the  Joint  posterior 
density  of  unknown  model  parameters  U  , .  .  .  ,U  .  Univariate  marginal 
inference       summaries       for       U  ,        say,        are       simply       obtained       from 

W  ,...,U  ).  Generally,  if  marginal  inference  summaries  are  required 
for  a  specified  function  of  (U  , .  .  .  ,U  ),  an  iid  sample  of  size  M  from 
the  corresponding  marginal  density  is  immediately  available  on  substi- 
tuting   the   generated   vectors       (U     U.  .),       J  ■    1,2 M,  into    the 

*  J  kj 

functional   form.    Exploration      of      bivariate       (or  higher      dimensional) 

marginal   summaries    proceeds    in   an   obvious    manner.  For   further      detail, 

and  comments   on  the  pragmatics   of  choices   of   i  and  M,    see  Gelfand      and 
Smith   (1990,    1991),    Gelfand  et  al   (1990,    1991). 

The  Gibbs  sampler  thus  provides  a  simulation-based  alternative  to 
direct  numerical  integration  methods,  and  one  which  depends  only  on  our 
capacity  to  generate  random  variates  (reasonably  efficiently)  from  the 
full    conditional    densities,     [U   \U  ,     r*s).        We    shall    now    look    at    this 

latter  issue  in  the  context  of  constrained  parameter  and  censored  data 
problems.  Our  discussion  here  will  be  kept  to  the  minimum  necessary  to 
give  the  reader  an  appreciation  of  how  the  Gibbs  sampler  achieves 
crucial  simplification.  For  a  much  more  complete  discussion,  see 
Gelfand  et  ai,    (1991). 

2.2  MODELS  WITH  CONSTRAINED  PARAMETERS 


Suppose     a     parametric     model     for     data     Y     involves      a     Jc-dimensional 

parameter    vector    6,     constrained    to     lie     in    a    subset    s      of    R   .         For 

simplicity  of  exposition,  we  shall  assume  here  that  Si  does  not  depend 
on  y  (as  would  be  the  case,  for  example,  if  some  components  of  8  were 
truncation  parameters:  see  Gelfand  et  ai,  1991).  Suppose  further  that 
[y|0],  [0]  denote  the  (unconstrained)  forms  of  likelihood  and  prior,  so 
that  the   (constrained)   Joint  posterior  for  G  ■   (8.,..., 6.)    is  given  by 


V 


Proceeding  by  direct  numerical  integration,  we  see  that  there  is  an 
immediate  problem  in  calculating  the  normalizing  constant  and  subse- 
quent   problems    in    performing    (k-1) -dimensional    integrals    (over    subsets 

of  Sr)   to  obtain  marginal  density  forms. 

However,  consider  now  the  full  conditional  forms  required  for  the  Gibbs 
sampler.  If  57(9,.  J*-0  denotes  the  cross-section  of  ST  corresponding 
to  the  constraints  on  6.  imposed  by  Sr  for  specified  values  of  6.,  j*it 
we  have 

[ejy,  e  ,  j*i]    «    [y|e]»[e]     ,    e^s^e  ,  j*i)     . 

Moreover,  the  constraints  region  for  6.  will  typically  be  an  interval, 
or  a  union  of  intervals. 

It  follows  that  the  typical  random  variate  generation  task  required  for 
the  Gibbs  sampler  in  this  case,  will  simply  be  that  of  generating  from 
specified  univariate  density  shapes  truncated  to  intervals.  This  is  a 
relatively  straightforward  task:  in  any  case,  strikingly  easier  than 
high- dimensional  numerical  integration  over  complicated  constraint 
volumes.  '    , 

2.3  CENSORED  DATA  PROBLEMS 

Suppose        a    parametric    model    for    data    y    ■     (y ,Y  )         involves     a 

1  n 

k-dimensional  parameter  vector  6,    with  likelihood  defined  by 

[y|e]    =    n  [y  |e]    . 
i 

However,  suppose  that  for  j  *  m  >  1  there  exist  V .,  V .  such  that, 
instead  of  observing  Y.  exactly,  we  simply  observe  that  y.  €  [V .,  V .) , 
so  that  the  likelihood  is  actually  given  by 

n-1  n  J 


n    [y  |e]  •    n  [y  |e] 

i=l  j=m    Jv ,    J 


j 

(We  are  here  assuming  a  simple,  fully  specified  censoring  process  for 
convenience  of  exposition.  For  a  more  general  discussion,  see  Gelfand 
et  al,    1991). 

In  this  case,  a  moment's  reflection  reveals  that  the  full  conditional 
forms  implied  by  the  above  likelihood  combined  with  a  prior  [G]  are 
not,      in     general,     easy     forms     to     sample     from.  In     particular,      the 


integral  terms  may  have  no  closed-form  analytic  expressions,  so  that 
standard  envelope  rejection  or  ratio-of  -uniforms  sampling  techniques 
are  not  readily  applicable. 

However,    suppose    we    consider    Y'    =    (Y Y  )    as    additional    unknowns, 

id  n 

so    that   the   unknown   model    parameters    are    (8,    Y'),    with    the    data   given 

by        (y\    V,    V),      where     Y*  =   (Y, Y     ,),    V  =   (V V  )   and     V  = 

1  in-1  id  n 

(V V  ).       Consider  now  the   full   conditionals   required   for   the   Gibbs 

m  n 

sampler: 

[OjYV    V,    V,    ef    j*i,    Y']      ,      i  =   1 k     , 

[Yr|Y*,    V,    V,    G,    Yg,    s*r,    s  *  m)      .      r  =  m n     . 

Careful    examination    of    the    conditioning    variables    reveals    that    the    full 

conditionals  for  6, ,  ...  ,6,    reduce  to 
1  k 

[B.\Y,    6Jt    j*l]      ,      i  =   1 k     , 

the  forms  that  would  have  obtained  in  the  uncensored  case!  Typically, 
these  forms  present  no  difficulty  for  random  variate  generation. 

For  the  full  conditionals  for  Y Y  ,  examination  of  the  forms 

jd  n 

reveals  that  these  reduce  to 

V 

r 


[Yr|e]     /  [Yr\6]      ,      r  =  m n     ; 


V 

r 

namely,  the  sampling  distributions  for  the  Y  restricted  to  the  ranges 
[V.,  V.).  Again,  these  typically  present  no  difficulty  for  random 
variate  generation. 

The  trick  of  treating  censored  observations  as  unknowns  in  combination 
with  the  Gibbs  sampler  leads  to  simple  Bayesian  calculation  strategies 
in  otherwise  intractable  problems  (see,  also,  Tanner  and  Wong,  1987, 
for  a  related  manifestation  of  the  idea).  In  the  next  section,  we 
illustrate  this  concretely  by  detailing  the  forms  of  the  Gibbs  sampler 
arising  in  a  range  of  parametric  models  used  in  various  kinds  of 
survival  studies. 


3.  Illustrations  For  Parametric  Survival  Models 

3.  1  ORDERED  BINOMIAL  PARAMETERS 

Consider    conditionally        independent    observations        Y  ~    Binomial        (n., 

6  ),    i  ■    1,2 k   ,    where   it   is   known  that      6     s  q     s.  .  .s  e.       and   we 

seek    to    make    inferences    about    the    8.    (or    functions,     thereof,     such    as 

9.  +1_e.  or    (9.  .-©.)    /  G.).    Problems   of      this      kind  arise,    for  example, 

in  reliability  development  testing  (Smith,  1977;  Fard  and  Dietrich, 
1987),  where  stages  l,...,k  correspond  to  successive  improvements  in 
reliability. 

If  the  Joint  prior  density  is  taken  to  be  proportional  to 

k 


n 


i=l 


e"'-1  (i-e/'"1 


over  the  simplex,      Sr  =   {(e 8   )  0  s  e.   s  Q     s. .  .s  6.    s    1},    by 

conjugacy  the  Joint  posterior  [e|y]  has  the  same  form,  with  support  &, 
but  with  a.,    3.  replaced  by  a.  +  Y  ,    ^    +  n.  -  Y .,    respectively. 

Implementation  of  the  Gibbs  sampler  is  now  seen  to  be  very  simple.  The 
full  conditionals  are  given  by 

[e.\Y,    6f    j*i]     =     Beta  (a>  Y v    p£  +  nf  -  YJ        .        i  =  1 k     , 

restricted   to      the      interval      6.   ,    se.se.,    (Grt    =    0,    6,    ,   =    1),    and 

l-l  i  l+l        0  k+1 

random  variate  generation  is  straightforward. 

3.2  CENSORED  REGRESSION  DATA 

Schmee  and  Hahn  (1979)  modelled  log-failure  times  of  motorettes  tested 
at  four  different  temperatures  by  a  straight-line  regression  of 
log-failure      versus       transformed       temperature.  Censoring      occurred 

whenever  a  motorette  had  not  failed  at  the  end  of  the  test  period.  The 
uncensored  case   likelihood   [Y|e]    is  assumed  to      derive     from     Y .     «  a  + 

pXi  *   €i/'    where  €ij  ~  Nf-°>      *}*       I  ■   1 k,       j  =    1 n^    but   the 

actual  data,    Z,    are  given  by 


zu 


Y  Y      s  V 

ij  ij         i 

if 

V.  Y..  >  V. 


where  V.   is  the  total  test  time  at  temperature  corresponding  to  X.. 

To    Implement   the  Gibbs   sampler,    as    indicated    in   Section   2.3,    we    include 
the   unobserved   Y . .    (i.e.,    those   where   Y .  .  >    V)    as   further   unknowns    in 

the   model,    in  addition   to   the   basic   parameters   of    interest,    a,£   and   or  . 

Given  conjugate  normal   prior  forms   for  a,  3   and   an      inverse-gamma   prior 

2 

for    <r  ,     it    is    easily    verified    that    the    full    conditional    forms    for    a,  3 

2 

and    or     are   straightforwardly    identified    conjugate    forms    (normal,     normal 

the     y .  .    were 
ij 
precisely    observed.        The    full    conditionals    for    the    unobserved    Y..    are 

2  •* 

simply  N(a  +   fiX.,    <r  ),    restricted   to    the   range   Y..   >    V ..       Again,    random 

variate  generation  from  all  these  full  conditionals   Is  unproblematic. 
3.3  TRUNCATED  BI VARIATE  NORMAL  DATA 


and     inverse-gamma,      respectively)     obtained     as   if     all 


,  n,    where  some   of 
One   context    In   which   such   data   arises    is    in 


Consider   a   blvariate   normal   process    (X.,    Y .) ,    i=    1, 

the   y.   are   not   observed. 

l 

paired  survival  time  studies  (using  observed  logarithms  of  survival 
times),  where  observation  (Y.)  of  the  second  of  the  paired  patients  is 

terminated  when  the  first  of  the  pair  dies,  so  that  y.  is  observed  only 

if  y.  s  x . 

i       i 

More  precisely,    we  assume  iid  pairs    (X.,    y.)   such  that  for  i  =   1 n, 


r  x.  i 

y. 

i 


~     N 


V 

■  V 

12 


12 


We    observe    the    pairs    (X.,     Z.)    with    Z. 


if    y     s    X.;     otherwise    we 


observe    (X.,    *),    where  Z.  =   *    indicates   that  Y.   >   X.. 
li  i  l 


Suppose   that  the 

prior   for    (G  ,    8_)    is   taken   to   be  bivariate   normal   with      mean    (ji  ,    n?) 
and   co variance   matrix   V,    and    that   the   prior   for   the   co variance   matrix, 


I,    say   is    taken   to   be   an    inverse-Wishart,    so    that 


[Z"1]    =    V  iipR)'1, 


p) ,    with  all  the  hyperparameters  p. ,    np,    V,   p  and  R  known. 


Interest    focuses 


on      marginal     inferences    for    G. ,     0_    and    Z,     but, 
.3,     unobserved    value 
unknowns    In   specifying    the   Gibbs    sampler. 


following    Section    2.3,     unobserved    values    of    Y      are    also     treated    as 


Defining   ^   =    (X  ,    Y  ).    T 


IT. T  ),    T  =  n'^T,   +..+  T  ),    e  =   (G.,    G.)   and  *i  =   (*v    Mp).    it  is 

In  In  l        ^  id 

easily  verified  that 
[e|7,   Z,    I]  =  tfWn"1*:  +  V)"1  f  ♦  n^Ztn"1!  +  V)n,    (nZ_1  +  F-1)"1}      , 

Ie'V,  z,  e]  =  i/{(z(r   -  e)(r   -  e)'  *  pR)'1,  n  +  P> 

I 

and 

[y.|xr  z..  e.  E]  =  N<e2  +  *12«f  (x,  -  e^.  •*  -  o-^2}     , 

truncated  to  (X.,  »)  if  Z.  =  *,  with  y.  degenerate  at  Z.  otherwise. 
The  required  random  variate  generation  is  routine. 


3.4 


WEIBULL  PROPORTIONAL  HAZARDS  WITH  CENSORING 


Consider   a   survival    time    model    in   which    the      hazard    function   X(t;    Z), 
for  an  individual  with  covariate  values  Z  at  time  t,  is  given  by 


X(t;    X)     =     pt 


P-1 


exp(Z0) 


where  0  =  (0  ,  0  ,  ...,0  )'  is  a  vector  of  unknown  regression  parameters 
and  p  >  0  is  the  unknown  Weibull  shape  parameter. 


If  t.,...,t     are  explicitly  observed  survival  times  and  t  ,  t     are 

In  *  *  n+1  m 

censored    (T    >    t)     lifetimes,     with    2 
jth  case,    the  likelihood  is  given  by 


censored    (T    >    t)    lifetimes,    with    Z.   denoting    covariate    values    for    the 


n 


p-l     z 

e 


/ 


n+m  ZB 

nexpf^e/5) 


Clearly,  whatever         the         prior         specification,  the         resulting 

(p     +     2) -dimensional     posterior     is     awkward     to     handle     using     standard 
numerical  integration  procedures. 

However,  it  is  easily  verified  that  the  second  partial  derivatives  of 
the  log- likelihood  with  respect  to  each  of  the  p  +  2  unknown  parameters 
are  all  non-positive  (see  Dellaportas  and  Smith,  1991).  If  the  prior 
density  is  chosen  to  be  log-concave,  it  follows  that  all  the  posterior 
full  conditionals  are  log-concave.  The  import  of  this  observation  is 
that  highly  efficient  methods  exist  for  random  variate  generation  from 
log-concave    densities    (see,     in   particular,    Gilks    and       Wild,        1991),    so 


that  routine,  straightforward  Bayesian  calculation  for  widely  used 
cases  of  proportional  hazards  models  is  possible  (see  Dellaportas  and 
Smith,    1991,    for  wider  exploitation  of  log-concavity). 


4.  A  Nonparametric  Illustration 

4. 1  INTRODUCTION 

Nonparametric  Bayesian  inference  for  the  survival  function  with  right 
censored  data  has  been  studied  by  Susarla  and  Van  Ryzin  (1976),  and 
Ferguson  and  Phadia  (1979).  However,  we  often  encounter  the  situation 
where  some  observations  are  censored  from  the  left  and  some 
observations  are  censored  from  the  right  (see  Turnbull,  1974,  for 
references  to  papers  addr*»ssing  doubly  censored  data  sets  from  a 
frequent  is  t  perspective). 

In  this  section,  we  study  a  nonparametric  Bayesian  approach  to  such 
problems,  which  allows  us  to  incorporate  prior  beliefs  and  frees  us 
from  making  a  restrictive  (parametric)  model  assumption  for  the 
survival  function.  Specifically,  we  assume  that  the  distribution 
function  F  of  survival  times  has  a  prior  given  by  Ferguson's  (1973) 
Dirichlet  process,    D(a).      The  measure  a  can  be  written  as   AfF_,    where  Fn 

is  the  prior  mean  of  F  and  F_(l  -F_)  /  (N  *  1)  is  the  prior  variance  of 
F.  The  larger  N,  the  more  strongly  the  prior  specifies  that  F  concen- 
trates around  Fn. 

In  the  doubly  censored  data  case,  it  is  very  difficult  to  obtain  an 
explicit  expression  for  non-parametric  Bayesian  estimators  even  in  the 
form  of  the  posterior  mean.  We  shall  show,  however,  that  the  Gibbs 
sampler  approach,  which  augments  the  data  by  using  latent  variables 
that  decompose  the  number  of  the  censored  observation  into  the  possible 
numbers  of  observations  falling  into  each  interval,  provides  a 
straightforwardly      computed      numerical      solution.  As      illustrated      in 

Section  2.3,  this  augmentation  facilitates  the  specification  of 
appropriate  full  conditional  densities,  particularly  here  for  the 
survival  functions  given  the  latent  variables.  The  iterated  sampling 
scheme  then  allows  us  to  approximate  the  posterior  distribution  of  the 
survival  function. 

4.2  THE  MODEL 

We  shall  illustrate  the  approach  using  a  model  similar  to  that  studied 
by  Turnbull  (1974),  who  proposed  a  self -consistent  algorithm  for 
computing  the  generalized  maximum  likelihood  estimators.  Here,  we  add 
the  Dirichlet  process  prior  to  the  model. 

Let   T, ,    T^ T     denote    the    true   survival    times    of    n    individuals    that 

12  n 


could    be   observed   precisely    if    no    censoring   were   present.       The   T.   are 

independent    and     identically    distributed    with    distribution    F;     that     is, 
Fit)    =   PiT   s    t)    for    t   fc   0.       We   consider   the   case   that   not   all   T.   are 

observed   precisely.       For  each   i,    we  assume   that   there  are    "windows"    of 
observations    Vr  and    V.    iV .    s    y.)    that    are    either    fixed    constants    or 

random  variables  independent  of  the  {T .} .      We  observe 

Xi  =  max  [   min(7\,    VJ,   V  ]      . 

Moreover,    for  each   item,    we  also   know  whether   it    is    left-censored      with 

X.  =  V.,    or  right-censored  with  X.  -  V.,    or  a  precisely      observed      time 

with  X.  =  T .. 
1  1 

We   assume    that    items    (or    patients)    are   examined    at   discrete    times    (for 

example,     monthly)    and    that    there    is    a    natural    discrete    time    scale    0    < 

t.     <    t_    <.  .  .  ,     t  ,     with    observed    deaths    classified    into    one    of    the    m 
id  m 

intervals    (0,     t.],(t.,     t_]...,(t     ,,     t   ].       Let   5.   denote   the   number   of 
l  id  m-i        ."i  i 

precise  observations    (  =  )    in   the   period    (t,  .,     t.],    p.   denote   the   number 

of    left-censored    (s)    entries  at  age   t .,    and  A.     denote      the      number      of 

right-censored    (>)    entries   at    t ..       It    is   assumed    that   the    left-censored 

l 

entries  jx.  all  occur  at  the  end  of  age  period    (t.,    t.  .].      The  data  can 
then  be  summarized  by  the  following  tabulation: 

Type  of  obs.  \  age   (0,  t. ]        (t  ,  t„)        ...    (t  ,,  t  ] 

w  «,  «2        ...         «„ 

(»  xt  x2    ...     xn 

Let  P.   =  P(t.)  =  1  -  Fit.)   denote  the  survival   function   evaluated   at 
t  .,  so  that  the  likelihood  function  is  proportional  to 

m 

Let   e     =   P         -    P        for      j   =    1,...,«  and    let   G     ,    =    P  .       The   prior 
j  j— l  j  in+i  n 

process    specifies     that    the    distribution    of    the    e's     is     the    Dirichlet 

distribution 


m 


m+1 
ir(e)     =     c 


n  <v j 


a  -1 


where 


for  J  =  1 a  +   1,    with  Fn(t     ,)  =  1,    and 

U       JD+l 


c    =  «« 


rC  "v 


The   posterior   distribution   of   G   =    (8,,    6_,...,8   ,8     , )    is    known    to    be 

\<L  w       nt+i 

a  mixture  of  Dirichlet  distributions  (see  Antoniak,  1974).  In  the  next 
section,  we  show  how  the  Gibbs  sampler  side-steps  the  need  for  direct 
computation  of  this  mixture. 

4.3  APPROXIMATION  VIA  THE  GIBBS  SAMPLER 

To     employ     the    Gibbs     sampler,     we    use     the     idea    of     Section    2.3     and 

introduce   latent  variables      that      decompose      the      numbers      of      censored 

entries    into       the      numbers       of       observations       belonging    to    individual 

intervals.       Let   Z,  .,    Z„  .,  ...,Z..  denote   the   random   variables    that   count 
lj        2j  jj 

the  number  of  observations  in  ja  .  that  might  fall  in  the  intervals  (0, 
t.l.Ctj,     t   ] (t .  -,     t  ],    respectively,       so      that         u.   =  J/         Z ... 

Further,     let    Z...  ...... Z     .  .    denote    the    number    of    observations     in    X. 

j+lf  m+lj  j 

that  might  fall  in   the   intervals   ( t  .,  t.  4] (t  ,,  t  ],   (t  ,»], 

j       j+l  JD-1        m  in 

respectively,    so  that  A  .  =  Jj  Z.  .. 

J         J=J-H     J 

Our  objective  is  to  summarize,  via  samples  generated  form  the  Gibbs 
sampler,  the  posterior  distribution  of  8  given  the  data.  The  posterior 
full  conditional  for  8  given  the  Z's  and  the  data,  is  easily  seen  to  be 
an  up-dated  Dirichlet  distribution  depending  only  on  the  Z's.  The 
posterior  full  conditional  for  the  Z's  given  8  and  the  data,  is  easily 
seen  to  be  a  product  of  multinomial  distributions.  Thus,  suppose  at 
the    ith    iteration    step    of    the    Gibbs    sampler,     we       have    the    realization 

8*  =    (ef,    8^ 81    ,),    with  jf*1   e\     *      1.         We     then      up-date      the     Z 

l       d  m+l  1—1      ■* 

variables  from  the  multinomial  distributions  as  follows.  For  each 
j,    j  =    l,...,m,    we  sample   Z.      , .  .  .  ,Z         from   the   multinomial      distrlbu- 


tion  with  sample   size  ji .  and   parameters      r\  . r..,       where      r .  .  =   6. 

/  J^       ej  for   J  =    1 j.       Similarly,    we  sample  zj*j Z£\f    from 

the   multinomial    distribution      with      sample      size      A  .      and      parameters 

rUj w  where  rl>  ■  8i  /  *£*,  ei for  J  *  J  +  1 ■**• 

Having  sampled  the  Z  random  variables,  we  then  generate  new  G 
variables  from  the  Dirichlet  distribution  as  follows.  We  compute,  for 
each  i,    i  =  1, . .  .  ,n  +  1, 


V  -  •,  ♦  -i  ♦  1  #  • 


and  then  sample  (9,   ,...,0   ,  9  ,)  from  the  Dirichlet  distribution 

l  m  jzh*1 

with  parameters   (Y.      , .  . .  , Y     .). 
l  nt+l 

By    running    M    parallel     independent    replications    of    the    sampler,     after 

the   ith    iteration,    we   have   9,    ,    9„   ,...,9     ,      ,    and   YA    ,...,Y     ,      ,    for 

Is.      2s  m+l,s  is  m+l,s 

s  =    1,..,M.       The  posterior  distribution  of  9.  for      i  =    1 ..... m  +    1   can 
then  be  approximated   (for  sufficiently  large  i)   by 

_       M  iin-1 

F(62|data)     =     «        £    BetaiY^,     Y    Y^     • 

s=l  k*J 

where  Beta(a.b)  denotes  the  beta  density  with  parameters  a  and  b.    A 
posterior  estimate  of  the  9.  is  then  given  by 


0 


i-  "-1  I 


M  Yl 


-1     r  Is 

LJ«1     is 


Other  posterior  summaries  can  be  computed  similarly  from  the  replicated 
samples,  i  and  M  having  been  selected  to  achieve  "convergence"  to 
"smooth"  estimates. 

4.4  A  NUMERICAL  EXAMPLE 

To  illustrate  the  Gibbs  sampler  technique,  we  shall  reanalyze  the  data 
set  given  by  Kaplan  and  Meier  (1958).  The  data  consist  of  deaths 
occurring  at  .8,  3.1,  5.4  and  9.2  months  and  losses  occurring  at  1.0, 
2.7,  7.0  and  12.1  months.  For  comparison  purposes,  we  consider  the 
same  prior  specifications   used   by  Susarla  and   Van  Ryzin    (1976)    in   their 


Bayesian  reanalysis   of   the  data.      That   is,    ^n(*)    ■    1    ~   ©  with     4>     ~ 

.  12  and  N  =  4,8,    and   16. 

To    apply    the    Gibbs    sampler    approach,     we    divide    the    positive    real    line 

into   the  following      intervals:       (0,    .8~],     (.8~,    .8],     (.8,    1],     (1,    2.7], 

(2.7,    3.1~],        (3.1",    3.1],        (3.1.    5.4"],  (5.4~,     5.4],        (5.4,    7], 

(7,    9.2"],    (9.2",    9.2],     (9.2,     12.1],    and       (12.1,    »).       We    label    these 
intervals   by    (0,     t-],     it.,    tg] (*12>     i     ],    and    let   By    &2 and 

G._,    respectively,    denote  the  probabilities  assigned  to  the  intervals. 

The  likelihood  of  0  is  proportional  to 

KG)    =    e2e6G8en(G4  +  e5+...+  e13)    x 

(e5+-+ei3)(ei0+-+ei3)ei3      • 

-it       -it 

Let  a  =  N(e  -  e         ) ,  so  that  the  prior  distribution  of  G  is 

13       bc-1 


n-/ 


n(6)    =    c 

J=l 

where  C  is  the  normalizing  constant. 

Note   that  G0,    G0,    G0,    6, ,    and   G,_    in   the    likelihood   combine   simply   with 
c.         b         o         11  1 J 

the    corresponding    G    variables     in    the    prior    distribution,     so    that    the 

parameters    G0,     G_,     G0,     G, ,     and    0,o    are    each    up-dated    by    1     in    the 
c.  b  o  11  1 J 

posterior     distribution.         Therefore,     we     need     only     introduce     three     Z 

variables   for   the    incomplete  data,       namely,       Z  =    (Z. ,,    Z_ Z,_   ,), 

1  41  51  Id,  1 

Z2   =    (Z52*    Z62 Z13,2}'    and  Z3   "    (Z10,3*    Zll,3'    Z12,3*    Z13,2*'       We 

then   sample   Z.,    for   j  =    1,    2,    and   3,    from    the   appropriate    multinomial 

distribution  with  sample  size  1  and  rescaled  probabilities. 

To    estimate    the   survival    function   at    t  .,    we    accumulate    the    6     for    i    > 

j.       For    t   between    t     and    t       ,    an    interpolation   formula      that   connects 

the  survival  function  at  the  two  end  points  according  to  the  prior 
shape  can  be  used.  Tables  1  and  2  exhibit  the  Gibbs  sampler  results 
for    the    survival   function   evaluated    at    t      with    M   =    1000    and    M   =    4000, 

both  with  i  =  10.  The  exact  Bayes  solutions  given  by  Susarla  and  Van 
Ryzin  are  also  listed  for  comparison.  The  tables  show  that  the  Gibbs 
sampler  results   for  M  ■    1000   are   already  very      accurate      in      approxima- 


ting  the  exact  Bayes  rules.  Similar  results  hold  for  N  =  16.  For 
further  illustration  of  the  Gibbs  sampler  methodology,  see  Kuo  (1991), 
who  reanalyses  data  from  Turnbull  (1974). 


Table  1:    Gibbs  Approximation  to  the  Bayes  Estimates  for  N  =  4 
Statistics  \  age(t)  .8~  .8  1  2.7       3.  l"         3.1 


Pt   with  M   =  1000 

.970 

.886 

.879 

.819 

.805 

.702 

Pt   with  M   =  4000 

.970 

.886 

.879 

.819 

.805 

.701 

Exact  Bayes 

.970 

.886 

.879 

.819 

.805 

.701 

Statistics  \  age(t)  5.4  5.4  7  9.2  9.2  12.1 

P    with  M  =   1000  .632  .529  .491  .437  .305  .253 

P     with  M  =  4000  .      .632  .529  .491  .438  .307  .256 

Exact  Bayes  .632  .528  .490  .438  .306  .255 


Table  2:    Gibbs  Approximation  to  the  Bayes  Estimates  for  N  =  8 
Statistics  \  age(t)  .8-  .8  1  2.7       3.  1~         3.1 

. 792        . 773        . 698 

. 792   . 773   . 700 
. 793   . 773   . 699 

Statistics  \  age(t)  5.4"         5.4  7         9.2~         9.2        12.1 


?t   with  M   =  1000 

.954 

.892 

.881 

Pt  with  ft  =  4000 

.954 

.892 

.881 

Exact  Bayes 

.954 

.892 

.881 

Pf  with  M   =  1000 

.600 

.527 

.474 

.405 

.316 

.249 

?t   with  M   =  4000 

.602 

.529 

.474 

.405 

.318 

.250 

Exact  Bayes 

.602 

.528 

.474 

.405 

.318 

.250 
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